Architecture Design
Overview
Our data processing and management system integrates several components to effectively handle and transform data from raw inputs to actionable business insights. This paper outlines the architecture and functionality of each key component within the system.
Data Warehouse
Function: Central repository for storing all system data.
Details: All incoming data is systematically stored in tables within the data warehouse, providing a structured and organized storage solution.
Download and Decode Data
Component: Spark Jobs
Function: Download and decode data.
Process:
- Download: Spark jobs download raw data from an RPC service.
- Save: The downloaded raw data is saved to the data warehouse.
- Decode: Raw data is decoded into source data (calls and events) using the ABI protocol, which is stored in the CMS database.
Admin CMS
Function: Management of ABIs and decoded data.
Details: The CMS manages the Application Binary Interfaces (ABIs) of protocols necessary for data decoding. ABI data is stored in the CMS database to ensure accurate and efficient decoding processes.
Data Transformer
Component: Transformer (DBT Project)
Function: Transform source data into business tables.
Role: Similar to the Spellbook on Dune, the Transformer processes and refines source data tables into business-ready tables, enabling meaningful analysis and reporting.
Workflow Orchestration
Component: Airflow
Function: Administration and scheduling.
Role: Airflow handles the scheduling and coordination of tasks across the system, ensuring that data processing workflows are executed efficiently and reliably.
Query Engine
Primary Engine: Trino Query Engine
Function: Execute SQL queries from the BI tool.
Details:
- Infrastructure: Runs on Amazon EMR (Elastic MapReduce) to leverage AWS services for detailed data processing.
- Operation: Receives SQL queries from the BI tool (Metabase), reads data from the data warehouse, performs calculations, and summarizes results to be returned to the BI tool.
Business Intelligence Tool
Component: Metabase
Function: User interface for data analysis and visualization.
Capabilities:
- Querying: Allows users to write SQL queries.
- Visualization: Users can create charts and build dashboards.
- Data Storage: The BI application's data, including user-created charts and dashboards, is stored in a PostgreSQL database.